{ "cells": [ { "cell_type": "markdown", "metadata": {}, "source": [ "# Classification\n", "\n", "The goal of classification is to find a function that separates the data into positive/negative labels. In the case of a linear classifier, this reduces to finding a set of parameters $w^\\star$ such that, $$ \\begin{align} w^\\star &= \\arg \\min_w \\sum_{i=1}^{N} \\left[y_i\\neq \\text{sign} (w^\\top x_i) \\right] \\\\ &= \\arg \\min_w \\sum_{i=1}^{N} l_{0/1} (w; x_i, y_i) \\end{align}.$$ \n", "\n", "The problem with the $l_{0/1}$ loss, is that it is non-convex (and non-differentiable), hence other surrogate losses must be used to optimize the number of points. " ] }, { "cell_type": "code", "execution_count": null, "metadata": { "scrolled": true }, "outputs": [], "source": [ "%matplotlib inline\n", "%load_ext autoreload\n", "%autoreload 2\n", "import ipywidgets\n", "from ipywidgets import interact, interactive, interact_manual\n", "import IPython\n", "from matplotlib import rcParams\n", "rcParams['figure.figsize'] = (10, 5)\n", "rcParams['font.size'] = 16\n", "\n", "import numpy as np\n", "import matplotlib.pyplot as plt\n", "from utilities.util import gradient_descent\n", "from utilities.load_data import linear_separable_data, circular_separable_data\n", "from utilities import plot_helpers \n", "from utilities.classifiers import Perceptron, SVM, Logistic\n", "from utilities.regularizers import L1Regularizer, L2Regularizer" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "scrolled": true }, "outputs": [], "source": [ "rcParams['figure.figsize'] = (10, 5)\n", "rcParams['font.size'] = 16\n", "\n", "num_points = 100 # Number of points per class\n", "noise = 0.2 # Noise Level (needed for data generation).\n", "TEST_FRACTION = .80\n", "np.random.seed(42)\n", "X, Y = linear_separable_data(num_points, noise=noise, dim=2)\n", "\n", "fig = plt.subplot(111)\n", "opt = {'marker': 'ro', 'label': '+', 'size': 8}\n", "plot_helpers.plot_data(X[np.where(Y == 1)[0], 0], X[np.where(Y == 1)[0], 1], fig=fig, options=opt)\n", "opt = {'marker': 'bs', 'label': '-', 'x_label': '$x$', 'y_label': '$y$', 'size': 8, 'legend': True}\n", "plot_helpers.plot_data(X[np.where(Y == -1)[0], 0], X[np.where(Y == -1)[0], 1], fig=fig, options=opt)\n" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "scrolled": true }, "outputs": [], "source": [ "rcParams['figure.figsize'] = (10, 5)\n", "rcParams['font.size'] = 16\n", "\n", "# Separate into train and test sets!\n", "indexes = np.arange(0, 2 * num_points, 1)\n", "np.random.shuffle(indexes)\n", "num_train = int(np.ceil(2 * TEST_FRACTION * num_points))\n", "\n", "X_train = X[indexes[:num_train]]\n", "Y_train = Y[indexes[:num_train]]\n", "\n", "X_test = X[indexes[num_train:]]\n", "Y_test = Y[indexes[num_train:]]\n", "\n", "fig = plt.subplot(111)\n", "\n", "opt = {'marker': 'ro', 'fillstyle': 'full', 'label': '+ Train', 'size': 8}\n", "plot_helpers.plot_data(X_train[np.where(Y_train == 1)[0], 0], X_train[np.where(Y_train == 1)[0], 1], fig=fig, options=opt)\n", "opt = {'marker': 'bs', 'fillstyle': 'full', 'label': '- Train', 'size': 8}\n", "plot_helpers.plot_data(X_train[np.where(Y_train == -1)[0], 0], X_train[np.where(Y_train == -1)[0], 1], fig=fig, options=opt)\n", "\n", "opt = {'marker': 'ro', 'fillstyle': 'none', 'label': '+ Test', 'size': 8}\n", "plot_helpers.plot_data(X_test[np.where(Y_test == 1)[0], 0], X_test[np.where(Y_test == 1)[0], 1], fig=fig, options=opt)\n", "opt = {'marker': 'bs', 'fillstyle': 'none', 'label': '- Test', 'size': 8, \n", " 
{ "cell_type": "markdown", "metadata": {}, "source": [
"## The Perceptron Algorithm\n",
"\n",
"The perceptron loss is defined as: $$L(w; X, Y) = \\sum_{i=1}^{N} L_p(w; x_i, y_i) = \\sum_{i=1}^{N} \\max \\{ 0, -y_i w^\\top x_i \\}.$$\n",
"\n",
"The loss function is continuous but not differentiable at $y_i w^\\top x_i=0$. A subgradient, however, exists and hence (stochastic) subgradient descent converges. The subgradient is:\n",
"\n",
"$$ \\partial L_p(w; x_i,y_i) = \\left\\{\\begin{array}{cc} 0 & \\text{if } -y_i w^\\top x_i < 0 \\\\ -y_i x_i & \\text{if } -y_i w^\\top x_i > 0 \\\\ \\left[0, -y_i x_i \\right] & \\text{if } -y_i w^\\top x_i = 0 \\end{array} \\right. $$" ] },
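{ "cell_type": "markdown", "metadata": {}, "source": [
"As a quick sanity check of the formulas above, the following cell evaluates the perceptron loss and one subgradient directly with `numpy` on the training data. It is only an illustrative sketch with a random weight vector; the interactive demo below instead trains the `Perceptron` class from `utilities.classifiers` with the `gradient_descent` helper." ] },
{ "cell_type": "code", "execution_count": null, "metadata": { "scrolled": true }, "outputs": [], "source": [
"# Perceptron loss L_p(w) = sum_i max{0, -y_i w^T x_i} and a subgradient,\n",
"# evaluated for a random weight vector on the training data.\n",
"y = np.ravel(Y_train)  # labels in {-1, +1}\n",
"w = np.random.randn(X_train.shape[1])  # random linear classifier\n",
"margins = y * (X_train @ w)  # y_i w^T x_i for every point\n",
"loss = np.sum(np.maximum(0, -margins))\n",
"\n",
"# Subgradient: -y_i x_i for every misclassified point (margin < 0), 0 otherwise.\n",
"mask = margins < 0\n",
"subgradient = -(y[mask][:, np.newaxis] * X_train[mask]).sum(axis=0)\n",
"print('perceptron loss: {:.3f}'.format(loss))\n",
"print('subgradient:', subgradient)" ] },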
{ "cell_type": "code", "execution_count": null, "metadata": { "scrolled": true }, "outputs": [], "source": [
"rcParams['figure.figsize'] = (20, 5)\n",
"rcParams['font.size'] = 16\n",
"\n",
"n_iter_widget = ipywidgets.IntSlider(value=20, min=5, max=100, step=1,\n",
"                                     description='Number of iterations:', style={'description_width': 'initial'},\n",
"                                     continuous_update=False)\n",
"batch_size_widget = ipywidgets.IntSlider(value=1, min=1, max=X_train.shape[0], step=1,\n",
"                                         description='Batch Size:', style={'description_width': 'initial'},\n",
"                                         continuous_update=False)\n",
"noise_widget = ipywidgets.FloatSlider(value=0.2, min=0, max=1, step=0.1, readout_format='.1f',\n",
"                                      description='Noise:', style={'description_width': 'initial'},\n",
"                                      continuous_update=False)\n",
"\n",
"def change_learning_params(n_iter, batch_size, noise):\n",
"    np.random.seed(42)\n",
"    X, Y = linear_separable_data(num_points, noise=noise, dim=2)\n",
"    indexes = np.arange(0, 2 * num_points, 1)\n",
"    np.random.shuffle(indexes)\n",
"    num_train = int(np.ceil(2 * TRAIN_FRACTION * num_points))\n",
"\n",
"    X_train = X[indexes[:num_train]]\n",
"    Y_train = Y[indexes[:num_train]]\n",
"\n",
"    X_test = X[indexes[num_train:]]\n",
"    Y_test = Y[indexes[num_train:]]\n",
"\n",
"    classifier = Perceptron(X_train, Y_train)\n",
"    classifier.load_test_data(X_test, Y_test)\n",
"\n",
"    np.random.seed(42)\n",
"    w0 = np.random.randn(3, )\n",
"\n",
"    opts = {'eta0': 1,\n",
"            'n_iter': n_iter,\n",
"            'batch_size': batch_size,\n",
"            'n_samples': X_train.shape[0],\n",
"            'algorithm': 'SGD',\n",
"            'learning_rate_scheduling': None,\n",
"            }\n",
"    try:\n",
"        trajectory, indexes = gradient_descent(w0, classifier, opts=opts)\n",
"\n",
"        contour_plot = plt.subplot(121)\n",
"        error_plot = plt.subplot(122)\n",
"\n",
"        opt = {'marker': 'ro', 'fillstyle': 'full', 'label': '+ Train', 'size': 8}\n",
"        plot_helpers.plot_data(X_train[np.where(Y_train == 1)[0], 0], X_train[np.where(Y_train == 1)[0], 1], fig=contour_plot, options=opt)\n",
"        opt = {'marker': 'bs', 'fillstyle': 'full', 'label': '- Train', 'size': 8}\n",
"        plot_helpers.plot_data(X_train[np.where(Y_train == -1)[0], 0], X_train[np.where(Y_train == -1)[0], 1], fig=contour_plot, options=opt)\n",
"\n",
"        opt = {'marker': 'ro', 'fillstyle': 'none', 'label': '+ Test', 'size': 8}\n",
"        plot_helpers.plot_data(X_test[np.where(Y_test == 1)[0], 0], X_test[np.where(Y_test == 1)[0], 1], fig=contour_plot, options=opt)\n",
"        opt = {'marker': 'bs', 'fillstyle': 'none', 'label': '- Test', 'size': 8}\n",
"        plot_helpers.plot_data(X_test[np.where(Y_test == -1)[0], 0], X_test[np.where(Y_test == -1)[0], 1], fig=contour_plot, options=opt)\n",
"\n",
"        contour_opts = {'n_points': 50, 'x_label': '$x$', 'y_label': '$y$', 'sgd_point': True, 'n_classes': 4}\n",
"        error_opts = {'epoch': 5, 'x_label': '$t$', 'y_label': 'error'}\n",
"\n",
"        opts = {'contour_opts': contour_opts, 'error_opts': error_opts}\n",
"        plot_helpers.classification_progression(X, Y, trajectory, indexes, classifier,\n",
"                                                contour_plot=contour_plot, error_plot=error_plot,\n",
"                                                options=opts)\n",
"    except KeyboardInterrupt:\n",
"        pass\n",
"\n",
"interact_manual(change_learning_params, n_iter=n_iter_widget, batch_size=batch_size_widget,\n",
"                noise=noise_widget);" ] },
{ "cell_type": "markdown", "metadata": {}, "source": [
"## The SVM Algorithm\n",
"\n",
"The SVM (hinge) loss is defined as: $$L(w; X, Y) = \\sum_{i=1}^{N} L_{\\text{svm}} (w; x_i, y_i) = \\sum_{i=1}^{N} \\max \\{ 0, 1-y_i w^\\top x_i \\}.$$\n",
"\n",
"The loss function is continuous but not differentiable at $y_i w^\\top x_i=1$. A subgradient, however, exists and hence (stochastic) subgradient descent converges. The subgradient is:\n",
"\n",
"$$ \\partial L_{\\text{svm}}(w;x_i,y_i) = \\left\\{\\begin{array}{cc} 0 & \\text{if } 1-y_i w^\\top x_i < 0 \\\\ -y_i x_i & \\text{if } 1-y_i w^\\top x_i > 0 \\\\ \\left[0, -y_i x_i \\right] & \\text{if } 1-y_i w^\\top x_i = 0 \\end{array} \\right. $$\n",
"\n",
"The difference from the perceptron loss is that the SVM loss enforces a margin: even correctly classified points with $y_i w^\\top x_i < 1$ incur a loss." ] },
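{ "cell_type": "markdown", "metadata": {}, "source": [
"To connect the hinge loss to the training code below, the next cell evaluates an L2-regularized hinge objective and one subgradient directly with `numpy` for a random weight vector. It is an illustrative sketch only: the constant `lam` and the convention $\\lambda \\|w\\|^2$ are chosen here for illustration and need not match the `L2Regularizer` used by the interactive demo." ] },
{ "cell_type": "code", "execution_count": null, "metadata": { "scrolled": true }, "outputs": [], "source": [
"# Regularized hinge objective: sum_i max{0, 1 - y_i w^T x_i} + lam * ||w||^2,\n",
"# evaluated for a random weight vector on the training data.\n",
"lam = 0.01  # illustrative regularization strength\n",
"y = np.ravel(Y_train)\n",
"w = np.random.randn(X_train.shape[1])\n",
"margins = y * (X_train @ w)\n",
"hinge = np.maximum(0, 1 - margins)\n",
"objective = np.sum(hinge) + lam * np.dot(w, w)\n",
"\n",
"# Subgradient: -y_i x_i for every point with margin < 1, plus the gradient\n",
"# of the regularizer.\n",
"mask = margins < 1\n",
"subgradient = -(y[mask][:, np.newaxis] * X_train[mask]).sum(axis=0) + 2 * lam * w\n",
"print('regularized hinge objective: {:.3f}'.format(objective))\n",
"print('subgradient:', subgradient)" ] },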
{ "cell_type": "code", "execution_count": null, "metadata": { "scrolled": false }, "outputs": [], "source": [
"rcParams['figure.figsize'] = (10, 5)\n",
"rcParams['font.size'] = 16\n",
"reg_widget = ipywidgets.FloatSlider(value=-6, min=-6, max=3, step=0.5, readout_format='.1f',\n",
"                                    description='Regularization 10^:', style={'description_width': 'initial'},\n",
"                                    continuous_update=False)\n",
"lr_widget = ipywidgets.FloatSlider(value=1, min=1e-1, max=2, step=1e-1, readout_format='.1f',\n",
"                                   description='Learning rate:', style={'description_width': 'initial'},\n",
"                                   continuous_update=False)\n",
"n_iter_widget = ipywidgets.IntSlider(value=20, min=5, max=100, step=1,\n",
"                                     description='Number of iterations:', style={'description_width': 'initial'},\n",
"                                     continuous_update=False)\n",
"\n",
"batch_size_widget = ipywidgets.IntSlider(value=1, min=1, max=X_train.shape[0], step=1,\n",
"                                         description='Batch Size:', style={'description_width': 'initial'},\n",
"                                         continuous_update=False)\n",
"noise_widget = ipywidgets.FloatSlider(value=0.2, min=0, max=1, step=0.1, readout_format='.1f',\n",
"                                      description='Noise:', style={'description_width': 'initial'},\n",
"                                      continuous_update=False)\n",
"\n",
"def change_learning_params(reg, eta0, n_iter, batch_size, noise):\n",
"    np.random.seed(42)\n",
"    X, Y = linear_separable_data(num_points, noise=noise, dim=2)\n",
"    indexes = np.arange(0, 2 * num_points, 1)\n",
"    np.random.shuffle(indexes)\n",
"    num_train = int(np.ceil(2 * TRAIN_FRACTION * num_points))\n",
"\n",
"    X_train = X[indexes[:num_train]]\n",
"    Y_train = Y[indexes[:num_train]]\n",
"\n",
"    X_test = X[indexes[num_train:]]\n",
"    Y_test = Y[indexes[num_train:]]\n",
"\n",
"    classifier = SVM(X_train, Y_train)\n",
"    classifier.load_test_data(X_test, Y_test)\n",
"\n",
"    regularizer = L2Regularizer(np.power(10., reg))\n",
"    np.random.seed(42)\n",
"    w0 = np.random.randn(3, )\n",
"\n",
"    opts = {'eta0': eta0,\n",
"            'n_iter': n_iter,\n",
"            'batch_size': batch_size,\n",
"            'n_samples': X_train.shape[0],\n",
"            'algorithm': 'SGD',\n",
"            'learning_rate_scheduling': 'AnnealingSVM',\n",
"            'reg': regularizer.get_lambda() / batch_size\n",
"            }\n",
"    try:\n",
"        trajectory, indexes = gradient_descent(w0, classifier, regularizer, opts)\n",
"\n",
"        contour_plot = plt.subplot(121)\n",
"        error_plot = plt.subplot(122)\n",
"\n",
"        opt = {'marker': 'ro', 'fillstyle': 'full', 'label': '+ Train', 'size': 8}\n",
"        plot_helpers.plot_data(X_train[np.where(Y_train == 1)[0], 0], X_train[np.where(Y_train == 1)[0], 1], fig=contour_plot, options=opt)\n",
"        opt = {'marker': 'bs', 'fillstyle': 'full', 'label': '- Train', 'size': 8}\n",
"        plot_helpers.plot_data(X_train[np.where(Y_train == -1)[0], 0], X_train[np.where(Y_train == -1)[0], 1], fig=contour_plot, options=opt)\n",
"\n",
"        opt = {'marker': 'ro', 'fillstyle': 'none', 'label': '+ Test', 'size': 8}\n",
"        plot_helpers.plot_data(X_test[np.where(Y_test == 1)[0], 0], X_test[np.where(Y_test == 1)[0], 1], fig=contour_plot, options=opt)\n",
"        opt = {'marker': 'bs', 'fillstyle': 'none', 'label': '- Test', 'size': 8}\n",
"        plot_helpers.plot_data(X_test[np.where(Y_test == -1)[0], 0], X_test[np.where(Y_test == -1)[0], 1], fig=contour_plot, options=opt)\n",
"\n",
"        contour_opts = {'n_points': 100, 'x_label': '$x$', 'y_label': '$y$', 'sgd_point': True, 'n_classes': 4}\n",
"        error_opts = {'epoch': 5, 'x_label': '$t$', 'y_label': 'error'}\n",
"\n",
"        opts = {'contour_opts': contour_opts, 'error_opts': error_opts}\n",
"        plot_helpers.classification_progression(X, Y, trajectory, indexes, classifier,\n",
"                                                contour_plot=contour_plot, error_plot=error_plot,\n",
"                                                options=opts)\n",
"    except KeyboardInterrupt:\n",
"        pass\n",
"interact(change_learning_params, reg=reg_widget, eta0=lr_widget, n_iter=n_iter_widget,\n",
"         batch_size=batch_size_widget, noise=noise_widget);" ] },
{ "cell_type": "code", "execution_count": null, "metadata": { "scrolled": true }, "outputs": [], "source": [] } ],
"metadata": { "kernelspec": { "display_name": "Python 3", "language": "python", "name": "python3" }, "language_info": { "codemirror_mode": { "name": "ipython", "version": 3 }, "file_extension": ".py", "mimetype": "text/x-python", "name": "python", "nbconvert_exporter": "python", "pygments_lexer": "ipython3", "version": "3.7.6" }, "pycharm": { "stem_cell": { "cell_type": "raw", "metadata": { "collapsed": false }, "source": [] } } }, "nbformat": 4, "nbformat_minor": 2 }